11 research outputs found
Conspiracy in the Time of Corona: Automatic detection of Emerging Covid-19 Conspiracy Theories in Social Media and the News
Abstract
Rumors and conspiracy theories thrive in environments of low confidence and low trust. Consequently, it is not surprising that ones related to the Covid-19 pandemic are proliferating given the lack of scientific consensus on the virus’s spread and containment, or on the long-term social and economic ramifications of the pandemic. Among the stories currently circulating are ones suggesting that the 5G telecommunication network activates the virus, that the pandemic is a hoax perpetrated by a global cabal, that the virus is a bio-weapon released deliberately by the Chinese, or that Bill Gates is using it as cover to launch a broad vaccination program to facilitate a global surveillance regime. While some may be quick to dismiss these stories as having little impact on real-world behavior, recent events including the destruction of cell phone towers, racially fueled attacks against Asian Americans, demonstrations espousing resistance to public health orders, and wide-scale defiance of scientifically sound public mandates such as those to wear masks and practice social distancing, countermand such conclusions. Inspired by narrative theory, we crawl social media sites and news reports and, through the application of automated machine-learning methods, discover the underlying narrative frameworks supporting the generation of rumors and conspiracy theories. We show how the various narrative frameworks fueling these stories rely on the alignment of otherwise disparate domains of knowledge, and consider how they attach to the broader reporting on the pandemic. These alignments and attachments, which can be monitored in near real-time, may be useful for identifying areas in the news that are particularly vulnerable to reinterpretation by conspiracy theorists. Understanding the dynamics of storytelling on social media and the narrative frameworks that provide the generative basis for these stories may also be helpful for devising methods to disrupt their spread.
Novel scaling law governing stock price dynamics
A stock market is typically modeled as a complex system where the purchase,
holding or selling of individual stocks affects other stocks in nonlinear and
collaborative ways that cannot always be captured using succinct models. Such
complexity arises due to several latent and confounding factors, such as
variations in decision making because of incomplete information, and differing
short/long-term objectives of traders. While a few emergent phenomena such as
seasonality and fractal behaviors in individual stock price data have been
reported, universal scaling laws that apply collectively to the market are
rare. In this paper, we consider the market-mode adjusted pairwise correlations
of returns over different time scales (), , and discover
two such novel emergent phenomena: (i) the standard deviation of the
's scales as , for larger than a certain
return horizon, , where is the scaling exponent, (ii)
moreover, the scaled and zero-shifted distributions of the 's
are invariant of . Our analysis of S&P 500 market data collected
over almost years () demonstrates that the twin scaling
property holds for each year and across decades (orders of magnitude) of
. Moreover, we find that the scaling exponent provides a
summary view of market volatility: in years marked by unprecedented financial
crises -- for example and -- values of are
substantially higher. As for analytical modeling, we demonstrate that such
scaling behaviors observed in data cannot be explained by existing theoretical
frameworks such as the single- and multi-factor models. To close this gap, we
introduce a promising agent-based model -- inspired by literature on swarming
-- that displays more of the emergent behaviors exhibited by the real market
data.
Comment: 45 pages
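As a rough illustration of how a scaling exponent of this kind could be estimated, the sketch below fits a straight line in log-log space to the standard deviation of correlations measured at several return horizons. The exact scaling expression is missing from the abstract as extracted here, so the power-law form, the function name `fit_scaling_exponent`, and the input values are all assumptions made for illustration, not the authors' procedure.

```python
from math import log

def fit_scaling_exponent(horizons, sigmas):
    """Least-squares fit of log(sigma) = slope * log(horizon) + intercept.

    Assumes (hypothetically) that sigma follows a power law in the horizon,
    so the slope of the log-log line plays the role of the exponent.
    """
    xs = [log(t) for t in horizons]
    ys = [log(s) for s in sigmas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic input: sigma generated as 3 * horizon^(-0.5), illustrative only.
horizons = [1, 2, 4, 8, 16]
sigmas = [3.0 * t ** -0.5 for t in horizons]
slope, intercept = fit_scaling_exponent(horizons, sigmas)
```

On real data the fit would only be meaningful over the range of horizons where the scaling holds, which the abstract describes as horizons larger than a certain threshold.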
Which side are you on? Insider-Outsider classification in conspiracy-theoretic social media
Social media is a breeding ground for threat narratives and related
conspiracy theories. In these, an outside group threatens the integrity of an
inside group, leading to the emergence of sharply defined group identities:
Insiders -- agents with whom the authors identify and Outsiders -- agents who
threaten the insiders. Inferring the members of these groups constitutes a
challenging new NLP task: (i) Information is distributed over many
poorly-constructed posts; (ii) Threats and threat agents are highly contextual,
with the same post potentially having multiple agents assigned to membership in
either group; (iii) An agent's identity is often implicit and transitive; and
(iv) Phrases used to imply Outsider status often do not follow common negative
sentiment patterns. To address these challenges, we define a novel
Insider-Outsider classification task. Because we are not aware of any
appropriate existing datasets or attendant models, we introduce a labeled
dataset (CT5K) and design a model (NP2IO) to address this task. NP2IO leverages
pretrained language modeling to classify Insiders and Outsiders. NP2IO is shown
to be robust, generalizing to noun phrases not seen during training, and
exceeding the performance of non-trivial baseline models by .
Comment: ACL 2022: 60th Annual Meeting of the Association for Computational
Linguistics, 8+4 pages, 6 figures
Embed-Search-Align: DNA Sequence Alignment using Transformer Models
DNA sequence alignment involves assigning short DNA reads to the most
probable locations on an extensive reference genome. This process is crucial
for various genomic analyses, including variant calling, transcriptomics, and
epigenomics. Conventional methods, refined over decades, tackle this challenge
in two steps: genome indexing followed by efficient search to locate likely
positions for given reads. Building on the success of Large Language Models
(LLMs) in encoding text into embeddings, where the distance metric captures
semantic similarity, recent efforts have explored whether the same Transformer
architecture can produce numerical representations for DNA sequences. Such
models have shown early promise in tasks involving classification of short DNA
sequences, such as the detection of coding vs non-coding regions, as well as
the identification of enhancer and promoter sequences. Performance at sequence
classification tasks does not, however, translate to sequence alignment, where
it is necessary to conduct a genome-wide search to successfully align every
read. We address this open problem by framing it as an Embed-Search-Align task.
In this framework, a novel encoder model DNA-ESA generates representations of
reads and fragments of the reference, which are projected into a shared vector
space where the read-fragment distance is used as a surrogate for alignment. In
particular, DNA-ESA introduces: (1) Contrastive loss for self-supervised
training of DNA sequence representations, facilitating rich sequence-level
embeddings, and (2) a DNA vector store to enable search across fragments on a
global scale. DNA-ESA is >97% accurate when aligning 250-length reads onto a
human reference genome of 3 gigabases (single-haploid), far exceeds the
performance of 6 recent DNA-Transformer model baselines and shows task transfer
across chromosomes and species.
Comment: 17 pages, Tables 5, Figures 5, Under review, ICL
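The embed, search, align idea can be sketched in a few lines. The example below is a toy stand-in only: instead of a trained Transformer encoder with a contrastive loss, it uses normalised k-mer counts as the embedding, and instead of a dedicated vector store it keeps a plain Python list. The function names (`kmer_embed`, `build_store`, `search`) and all parameters are hypothetical and are not taken from DNA-ESA.

```python
from collections import Counter
from math import sqrt

def kmer_embed(seq, k=3):
    """Stand-in encoder: L2-normalised k-mer counts, stored as a sparse dict."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    norm = sqrt(sum(c * c for c in counts.values()))
    return {kmer: c / norm for kmer, c in counts.items()}

def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(v * b.get(kmer, 0.0) for kmer, v in a.items())

def build_store(reference, frag_len, stride):
    """Embed overlapping reference fragments; returns (start, embedding) pairs."""
    return [
        (start, kmer_embed(reference[start:start + frag_len]))
        for start in range(0, len(reference) - frag_len + 1, stride)
    ]

def search(read, store):
    """Return the start of the fragment whose embedding is closest to the read."""
    query = kmer_embed(read)
    return max(store, key=lambda item: cosine(query, item[1]))[0]

# Toy reference made of four homopolymer blocks (illustrative only).
reference = "A" * 40 + "C" * 40 + "G" * 40 + "T" * 40
store = build_store(reference, frag_len=40, stride=40)
best_start = search("G" * 39 + "T", store)
```

In a full pipeline, the fragment returned by the search would be the candidate region handed to a conventional fine-grained aligner; that step is omitted here.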
An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com
Reader reviews of literary fiction on social media, especially those in
persistent, dedicated forums, create and are in turn driven by underlying
narrative frameworks. In their comments about a novel, readers generally
include only a subset of characters and their relationships, thus offering a
limited perspective on that work. Yet in aggregate, these reviews capture an
underlying narrative framework comprised of different actants (people, places,
things), their roles, and interactions that we label the "consensus narrative
framework". We represent this framework in the form of an actant-relationship
story graph. Extracting this graph is a challenging computational problem,
which we pose as a latent graphical model estimation problem. Posts and reviews
are viewed as samples of subgraphs/networks of the hidden narrative framework.
Inspired by the qualitative narrative theory of Greimas, we formulate a
graphical generative Machine Learning (ML) model where nodes represent actants,
and multi-edges and self-loops among nodes capture context-specific
relationships. We develop a pipeline of interlocking automated methods to
extract key actants and their relationships, and apply it to thousands of
reviews and comments posted on Goodreads.com. We manually derive the ground
truth narrative framework from SparkNotes, and then use word embedding tools to
compare relationships in ground truth networks with our extracted networks. We
find that our automated methodology generates highly accurate consensus
narrative frameworks: for our four target novels, with approximately 2900
reviews per novel, we report average coverage/recall of important relationships
of >80% and an average edge detection rate of >89%. These extracted narrative
frameworks can generate insight into how people (or classes of people) read and
how they recount what they have read to others.
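To make the idea of aggregating partial views into a consensus graph concrete, here is a minimal sketch. It assumes each review has already been reduced to a set of (actant, relationship, actant) triples, and it replaces the latent graphical model described above with simple support counting. The names `consensus_graph` and `edge_recall`, the `min_support` threshold, and the example triples are all hypothetical.

```python
from collections import Counter

def consensus_graph(review_triples, min_support=2):
    """Keep edges that appear in at least `min_support` distinct reviews.

    Each edge is an (actant, relationship, actant) triple, so several
    relationships between the same pair of actants are kept as multi-edges.
    """
    support = Counter()
    for triples in review_triples:
        support.update(set(triples))  # count an edge at most once per review
    return {edge: n for edge, n in support.items() if n >= min_support}

def edge_recall(extracted, ground_truth):
    """Fraction of ground-truth edges that were recovered."""
    truth = set(ground_truth)
    return len(truth & set(extracted)) / len(truth) if truth else 0.0

# Made-up reviews of a made-up novel, for illustration only.
reviews = [
    [("hero", "fights", "dragon"), ("hero", "finds", "ring")],
    [("hero", "fights", "dragon"), ("hero", "fights", "dragon")],
    [("hero", "finds", "ring"), ("wizard", "guides", "hero")],
]
graph = consensus_graph(reviews, min_support=2)
recall = edge_recall(graph, [("hero", "fights", "dragon"), ("wizard", "guides", "hero")])
```

Exact triple matching is a simplification; the abstract describes comparing relationships with word embedding tools, which would tolerate paraphrased relationship labels.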
Mapping dreams in a computational space
This article demonstrates that an automated system of linguistic analysis can be developed – the Oneirograph – to analyze large collections of dreams and computationally map their contents in terms of typical situations involving an interplay of characters, activities, and settings. Focusing the analysis first on the twin situations of fighting and fleeing, the results provide densely detailed empirical evidence of the underlying semantic structures of typical dreams. The results also indicate that the Oneirograph analytic system can be applied to other typical dream situations as well (e.g., flying, falling), each of which can be computationally mapped in terms of a distinctive constellation of characters, activities, and settings.
Modelling social readers: novel tools for addressing reception from online book reviews
Social reading sites offer an opportunity to capture a segment of readers' responses to literature, while data-driven analysis of these responses can provide new critical insight into how people 'read'. Posts discussing an individual book on the social reading site, Goodreads, are referred to as 'reviews', and consist of summaries, opinions, quotes or some mixture of these. Computationally modelling these reviews allows one to discover the non-professional discussion space about a work, including an aggregated summary of the work's plot, an implicit sequencing of various subplots and readers' impressions of main characters. We develop a pipeline of interlocking computational tools to extract a representation of this reader-generated shared narrative model. Using a corpus of reviews of five popular novels, we discover readers' distillation of the novels' main storylines and their sequencing, as well as the readers' varying impressions of characters in the novel. In so doing, we make three important contributions to the study of infinite-vocabulary networks: (i) an automatically derived narrative network that includes meta-actants; (ii) a sequencing algorithm, REV2SEQ, that generates a consensus sequence of events based on partial trajectories aggregated from reviews, and (iii) an 'impressions' algorithm, SENT2IMP, that provides multi-modal insight into readers' opinions of characters.
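One simple way to picture how a consensus sequence might be built from partial trajectories is sketched below. It is not REV2SEQ, whose details are not given in this abstract. It is a swapped-in pairwise-precedence heuristic: each review contributes "event A came before event B" votes, and events are ordered by how often they precede others minus how often they follow. The function name `consensus_sequence` and the sample events are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def consensus_sequence(partial_orders):
    """Order events by net precedence votes across partial trajectories.

    Each trajectory is a list of events in the order one review mentions them.
    Ties are broken alphabetically so the result is deterministic.
    """
    score = defaultdict(int)
    for trajectory in partial_orders:
        # combinations() preserves list order, so `earlier` precedes `later`.
        for earlier, later in combinations(trajectory, 2):
            if earlier != later:
                score[earlier] += 1
                score[later] -= 1
    return sorted(score, key=lambda event: (-score[event], event))

# Three made-up reviews, each mentioning only part of the storyline.
trajectories = [
    ["departure", "battle"],
    ["battle", "return"],
    ["departure", "return"],
]
ordering = consensus_sequence(trajectories)
```

A vote-counting scheme like this can be misled by reviews that discuss events out of story order, so it should be read only as a starting point for thinking about the aggregation problem.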